Semantic v.s. Positions: Utilizing Balanced Proximity in Language Model Smoothing for Information Retrieval
نویسندگان
چکیده
Work on information retrieval has shown that language model smoothing leads to more accurate estimation of document models and hence is crucial for achieving good retrieval performance. Several smoothing methods have been proposed in the literature, using either semantic or positional information. In this paper, we propose a unified proximity-based framework to smooth language models, leveraging semantic and positional information simultaneously in combination. The key idea is to project terms to positions where they originally do not exist (i.e., zero count), which is actually a word count propagation process. We achieve this projection through two proximity-based density functions indicating semantic association and positional adjacency. We balance the effects of semantic and positional smoothing, and score a document based on the smoothed language model. Experiments on four standard TREC test collections show that our smoothing model is effective for information retrieval and generally performs better than the state of the art.
منابع مشابه
Structure et proximité pour la recherche documentaire
Our study compares the effectiveness of an information retrieval system based on the proximity of the query term occurrences in the documents and an IRS based on a language model with Dirichlet smoothing and with the Okapi BM25 model. Our proximity based model computes at each position in the document a value much higher as some occurrences of all the query terms are close to this position. Mor...
متن کاملSemantics-based Language Models for Information Retrieval and Text Mining
Semantics-based Language Models for Information Retrieval and Text Mining Xiaohua Zhou Xiaohua Hu The language modeling approach centers on the issue of estimating an accurate model by choosing appropriate language models as well as smoothing techniques. In the thesis, we propose a novel context-sensitive semantic smoothing method referred to as a topic signature language model. It extracts exp...
متن کاملComputing Term Translation Probabilities with Generalized Latent Semantic Analysis
Term translation probabilities proved an effective method of semantic smoothing in the language modelling approach to information retrieval tasks. In this paper, we use Generalized Latent Semantic Analysis to compute semantically motivated term and document vectors. The normalized cosine similarity between the term vectors is used as term translation probability in the language modelling framew...
متن کاملDeveloping a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information
With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...
متن کاملInformation Retrieval by Semantic Similarity
Semantic Similarity relates to computing the similarity between conceptually similar but not necessarily lexically similar terms. Typically, semantic similarity is computed by mapping terms to an ontology and by examining their relationships in that ontology. We investigate approaches to computing the semantic similarity between natural language terms (using WordNet as the underlying reference ...
متن کامل